Annotation and Use of Speech Production Corpus for Building Language-Universal Speech Recognizers

Authors

  • Jiping SUN
  • Xing JING
Abstract

A corpus linguistic study is reported in this paper, guided by articulatory phonology and by general phonetic principles of speech production. A direct application of this study is the construction of Hidden Markov Model topologies for automatic speech recognition, taking into account integrated multilingualism with the consideration of the common physiological organs and processes involved in the production of speech sounds from the world’s languages. We demonstrate in this study that incorporation of speech production principles can provide effective constraints on pronunciation modeling for the purpose of building language-universal speech recognizers.

Similar resources

Spontaneous Speech in the Spoken Dutch Corpus

In this paper the Spoken Dutch Corpus project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a corpus of 1,000 hours of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of (computational) linguistics and language and speech technology. Although the corpus will contain a fair amount of read speech...

Corpus of Spoken Slovak Language

In this paper a short description of activities towards building a general speech corpus of spoken Slovak language is given. The different rôles and specific features of text corpora and speech corpora are investigated, and the most frequent mistakes and misunderstandings of the concept of a speech corpus are mentioned. The concept of a big representative corpus of spoken language and its desir...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Exploring Pragmalinguistic and Sociopragmatic Variability in Speech Act Production of L2 Learners and Native Speakers

The pragmalinguistic and sociopragmatic aspects of language use vary across different situations, languages, and cultures. The separation of these two facets of language use can help to map out the socio-cultural norms and conventions as well as the linguistic forms and strategies that underlie the pragmatic performance of different language speakers in a variety of target language use situatio...

Unsupervised training of an HMM-based self-organizing unit recognizer with applications to topic classification and keyword discovery

We present our approach to unsupervised training of speech recognizers. Our approach iteratively adjusts sound units that are optimized for the acoustic domain of interest. We thus enable the use of speech recognizers for applications in speech domains where transcriptions do not exist. The resulting recognizer is a state-of-the-art recognizer on the optimized units. Specifically we propose buildi...

Publication date: 2000